31 research outputs found

    RELEASE: A High-level Paradigm for Reliable Large-scale Server Software

    Erlang is a functional language with a much-emulated model for building reliable distributed systems. This paper outlines the RELEASE project, and describes the progress in the first six months. The project aim is to scale Erlang's radical concurrency-oriented programming paradigm to build reliable general-purpose software, such as server-based systems, on massively parallel machines. Currently Erlang has inherently scalable computation and reliability models, but in practice scalability is constrained by aspects of the language and virtual machine. We are working at three levels to address these challenges: evolving the Erlang virtual machine so that it can work effectively on large scale multicore systems; evolving the language to Scalable Distributed (SD) Erlang; developing a scalable Erlang infrastructure to integrate multiple, heterogeneous clusters. We are also developing state of the art tools that allow programmers to understand the behaviour of massively parallel SD Erlang programs. We will demonstrate the effectiveness of the RELEASE approach using demonstrators and two large case studies on a Blue Gene

    Autonomous mobility in multilevel networks

    Autonomous Mobile Programs (AMPs) are mobile agents that are aware of their resource needs and sensitive to the execution environment. AMPs are unusual in that, instead of using some external load management system, each AMP periodically recalculates network and program parameters and independently moves to a new location if it provides a better execution environment. Dynamic load management emerges from the behaviour of collections of AMPs. AMPs have previously been measured using mobile languages like Java Voyager on local area networks (LANs). The thesis develops an accurate simulation for AMPs on networks and validates it by reproducing the behaviour of collections of AMPs on homogeneous and heterogeneous LANs. The analysis shows that AMPs exhibit thrashing like other distributed load balancers. This thrashing is investigated in collections of AMPs, and two types of redundant movement (greedy effect) are identified. The thesis explores the extent of greedy effects by simulating collections of AMPs, and proposes negotiating AMPs (NAMPs) to ameliorate the problem. The design of AMPs with a competitive negotiation scheme (cNAMPs) is presented, followed by a performance comparison of AMPs and cNAMPs using simulation. To estimate the significance of the greedy effects the properties of balanced states are established, such as independent balance, singleton optimality, and consecutive optimality. The balanced states are characterised for homogeneous and heterogeneous networks where AMPs are analysed as the general case. The significance of the cNAMP greedy effect is established by conducting a worst case analysis of redundant movements, and the maximum number, and probability of, redundant movements are calculated for homogeneous and heterogeneous networks. One of three theorems proves that in a heterogeneous network of q subnetworks the number of redundant movements does not exceed q − 1. The thesis proposes and evaluates a multilevel cNAMP architecture that abstracts over network topologies to effectively distribute cNAMPs in large networks. The thesis investigates alternatives for implementation of this multilevel architecture and proposes a fusion-based scheme where information is first available to neighbour nodes. These neighbour nodes modify the information and pass it to remote locations. The effectiveness of the scheme is evaluated by simulating networks with up to five levels, varying the number of locations from 5 to 336, and the number of cNAMPs from 8 to 3360. The experiments investigate the effects depending on the number of levels, topologies, number of locations, number of cNAMPs, work of cNAMPs, type of cNAMPs, speed of locations, and type of rebalancing. The architecture is found to be effective because it delivers performance close to the hypothetical, e.g. each additional level increases mean cNAMP completion time by just 2%

    Performance Portability Through Semi-explicit Placement in Distributed Erlang

    We consider the problem of adapting distributed Erlang applications to large or heterogeneous architectures to achieve good performance in a portable way. In many architectures, and especially large architectures, the communication latency between pairs of virtual machines (nodes) is no longer uniform. We propose two language-level methods that enable programs to automatically adapt to heterogeneity and non-uniform communication latencies, and both provide information enabling a program to identify an appropriate node when spawning a process. We provide a means of recording node attributes describing the hardware and software capabilities of nodes, and mechanisms that allow an application to examine the attributes of remote nodes. We provide an abstraction of communication distances that enables an application to select nodes to facilitate efficient communication. We have developed open source libraries that implement these ideas. We show that the use of attributes for node selection can lead to significant performance improvements if different components of the application have different processing requirements. We report a detailed empirical investigation of non-uniform communication times in several representative architectures, and show that our abstract model provides a good description of the hierarchy of communication times
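
The node-selection idea in this abstract can be illustrated with a minimal sketch. It is written in Python rather than Erlang and does not reproduce the paper's libraries: the function name `choose_node`, the attribute dictionaries, and the distance table are all hypothetical, and the abstract's distance abstraction is reduced here to a plain number per node.

```python
from typing import Dict, Optional


def choose_node(
    nodes: Dict[str, Dict[str, object]],
    required: Dict[str, object],
    distance_from_here: Dict[str, float],
) -> Optional[str]:
    """Pick the closest node whose attributes match every required value.

    nodes:              node name -> recorded attributes (hypothetical layout)
    required:           attribute values the spawned process needs
    distance_from_here: node name -> abstract communication distance
    """
    # Keep only nodes that satisfy all required attributes.
    candidates = [
        name
        for name, attrs in nodes.items()
        if all(attrs.get(key) == value for key, value in required.items())
    ]
    if not candidates:
        return None
    # Among the suitable nodes, prefer the one that is cheapest to talk to.
    return min(candidates, key=lambda name: distance_from_here[name])


# Hypothetical cluster description used only for illustration.
nodes = {
    "a@host1": {"gpu": True, "os": "linux"},
    "b@host2": {"gpu": False, "os": "linux"},
    "c@host3": {"gpu": True, "os": "linux"},
}
distance = {"a@host1": 0.5, "b@host2": 0.1, "c@host3": 0.25}

target = choose_node(nodes, {"gpu": True}, distance)
```

In this sketch the nearest node (`b@host2`) is skipped because it lacks the required attribute, so the choice falls to the nearest node that qualifies.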

    A scalable reliable instant messenger using the SD Erlang libraries

    Erlang has world leading reliability capabilities, but while it scales extremely well within a single node, distributed Erlang has some scalability issues. The Scalable Distributed (SD) Erlang libraries have been designed to address the scalability limitations while preserving the reliability model, and shown to deliver significant performance benefits above 40 hosts using some relatively simple benchmarks. This paper compares the reliability and scalability of SD Erlang and distributed Erlang using an Instant Messaging (IM) server benchmark that is a far more typical Erlang application; a relatively large and sophisticated benchmark; has throughput as the key performance metric; and uses non-trivial reliability mechanisms. We provide a careful reliability evaluation using Chaos Monkey. The key performance results consider scenarios with and without failures on up to 17 server hosts (272 cores). We show that SD Erlang adds no performance overhead when all nodes are grouped in a single s_group. However, either adding redundant router nodes in distributed Erlang applications, or dividing a set of nodes into small s_groups in SD Erlang applications, has a small negative impact. Both the distributed Erlang and SD Erlang IM tolerate failures and, up to the failure rates measured, the failures have no impact on throughput. The IM implementations show that SD Erlang preserves the distributed Erlang reliability properties and mechanisms

    A Reliable Instant Messenger in Erlang: Design and Evaluation

    This document describes the design and evaluation of two Erlang-based instant messenger systems using Distributed Erlang (D-Erlang) and Scalable Distributed Erlang (SD-Erlang). The purpose of these systems is to serve as real-world benchmarks to test the performance of the SD Erlang library

    Scalable SD Erlang Reliability Model

    This technical report presents the work we have conducted to support SD Erlang reliability and to formally specify the semantics of s_groups. We have considered the following aspects of SD Erlang reliability: node recovery after failures and s_group name uniqueness

    Simulating Autonomous Mobile Programs on Networks

    Autonomous mobile programs (AMPs) have been proposed for load management in dynamic networks. An AMP is aware of its resource needs and periodically seeks a better location in the network to reduce execution time. AMPs have previously been measured using mobile Java Voyager on local area networks (LANs). We have constructed a simulation model of AMPs and reproduced 4 sets of experiments on homogeneous networks, i.e. networks where all locations have the same processor speed, and 2 sets of experiments on heterogeneous networks with collections of large and small AMPs. The results show that simulated collections of AMPs obtain similar balanced states to those reached in the real experiments, and have only minor differences from real experimental results. The simulation model gives an opportunity to explore the greedy effect that can be observed in the real experiments. This gives us confidence to apply the simulation model for further investigation of AMP behaviour, including behaviours on wide area networks
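
The relocation decision described in this abstract can be sketched as follows. This is an illustration under stated assumptions and not the simulation model from the paper. It assumes that a location's speed is shared equally among the programs running there, and that an AMP moves only when the estimated time elsewhere, plus a fixed transfer cost, beats the estimated time at its current location. The names `Location`, `time_to_finish`, and `choose_location` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Location:
    name: str
    speed: float  # work units per second (hypothetical unit)
    load: int     # number of programs currently running here


def time_to_finish(work_left: float, loc: Location, extra: int = 0) -> float:
    """Estimated completion time when loc's speed is shared equally
    among its current programs plus `extra` newcomers (assumption)."""
    sharers = loc.load + extra
    return work_left * sharers / loc.speed


def choose_location(
    work_left: float, here: Location, others: List[Location], move_cost: float
) -> Location:
    """Return the location with the lowest estimated completion time.

    `here.load` already counts the deciding AMP; moving adds one program
    to the target and costs `move_cost` seconds of transfer time.
    """
    best, best_time = here, time_to_finish(work_left, here)
    for loc in others:
        candidate_time = move_cost + time_to_finish(work_left, loc, extra=1)
        if candidate_time < best_time:
            best, best_time = loc, candidate_time
    return best


# Hypothetical network: equal speeds, different loads.
here = Location("loc1", speed=100.0, load=4)
busy = Location("loc2", speed=100.0, load=3)
quiet = Location("loc3", speed=100.0, load=1)

decision = choose_location(1000.0, here, [busy, quiet], move_cost=5.0)
```

Because every AMP takes this decision independently from its own snapshot of the loads, several AMPs can pick the same quiet location at once. That is one plausible way to picture how redundant movements of the kind the abstract calls the greedy effect could arise.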

    Scalable SD Erlang Computation Model

    The technical report presents the implementation of s_groups and semi-explicit placement in Scalable Distributed (SD) Erlang. The implementation is based on Erlang/OTP 17.4. The source code can be found at https://github.com/release-project/otp/tree/17.4-rebased. We start with a discussion of differences between distributed Erlang global groups and SD Erlang s_groups (Chapter 1). Then we discuss the implementation of s_groups and the features of sixteen functions that were modified and introduced in global and s_group modules (Chapter 2). After that we discuss semi-explicit placement, node attributes and the choose_node/1 function (Chapter 3). These functions were unit tested (Chapter 4). Finally, we discuss future work (Chapter 5)

    Scalable Persistent Storage for Erlang

    The many-core revolution makes scalability a key property. The RELEASE project aims to improve the scalability of Erlang on emergent commodity architectures with 100,000 cores. Such architectures require scalable and available persistent storage on up to 100 hosts. We enumerate the requirements for scalable and available persistent storage, and evaluate four popular Erlang DBMSs against these requirements. This analysis shows that Mnesia and CouchDB are not suitable persistent storage at our target scale, but Dynamo-like NoSQL DataBase Management Systems (DBMSs) such as Cassandra and Riak potentially are. We investigate the current scalability limits of the Riak 1.1.1 NoSQL DBMS in practice on a 100-node cluster. We establish for the first time scientifically the scalability limit of Riak as 60 nodes on the Kalkyl cluster, thereby confirming developer folklore. We show that resources like memory, disk, and network do not limit the scalability of Riak. By instrumenting Erlang/OTP and Riak libraries we identify a specific Riak functionality that limits scalability. We outline how later releases of Riak are refactored to eliminate the scalability bottlenecks. We conclude that Dynamo-style NoSQL DBMSs provide scalable and available persistent storage for Erlang in general, and for our RELEASE target architecture in particular

    Scalable Reliable SD Erlang Design

    This technical report presents the design of Scalable Distributed (SD) Erlang: a set of language-level changes that aims to enable Distributed Erlang to scale for server applications on commodity hardware with at most 100,000 cores. We cover a number of aspects, specifically anticipated architecture, anticipated failures, scalable data structures, and scalable computation. Two other components that guided us in the design of SD Erlang are design principles and typical Erlang applications. The design principles summarise the types of modification we aim for to allow Erlang to scale. Erlang exemplars help us to identify the main Erlang scalability issues and hypothetically validate the SD Erlang design